NSF PAR Search | NSF Public Access Repository

Prediction of nucleic acid binding residues in protein sequences: Recent advances and future prospects

Basu, Sushmita; Yang, Yuedong; Kurgan, Lukasz (October 2025, Diabetes metabolic syndrome)

Computational prediction of DNA-binding residues (DBRs) and RNA-binding residues (RBRs) in protein sequences is an active area of research, with about 90 predictors, 20 of which were published over the last two years. The new predictors rely on sophisticated deep neural networks and protein language models, produce accurate predictions, and are conveniently available as code and/or web servers. However, we identified a shortage of tools that predict these interactions in intrinsically disordered regions and of tools capable of predicting residues that interact with specific RNA and DNA types. Moreover, cross-predictions between RBRs and DBRs should be quantified and minimized to ensure that future tools accurately differentiate between these two distinct types of nucleic acids.
Free, publicly-accessible full text available October 1, 2026
Empirical Assessment of Sequence-Based Predictions of Intrinsically Disordered Regions Involved in Phase Separation

https://doi.org/10.3390/biom15081079

Wu, Xuantai; Wang, Kui; Hu, Gang; Kurgan, Lukasz (August 2025, Biomolecules)

Phase separation processes facilitate the formation of membrane-less organelles and involve interactions within structured domains and intrinsically disordered regions (IDRs) in protein sequences. The literature suggests that the involvement of proteins in phase separation can be predicted from their sequences, leading to the development of over 30 computational predictors. We focused on intrinsic disorder due to its fundamental role in related diseases, and because recent analysis has shown that phase separation can be accurately predicted for structured proteins. We evaluated eight representative amino acid-level predictors of phase separation, capable of identifying phase-separating IDRs, using a well-annotated, low-similarity test dataset under two complementary evaluation scenarios. Several methods generate accurate predictions in the easier scenario that includes both structured and disordered sequences. However, we demonstrate that modern disorder predictors perform equally well in this scenario by effectively differentiating phase-separating IDRs from structured regions. In the second, more challenging scenario—considering only predictions in disordered regions—disorder predictors underperform, and most phase separation predictors produce only modestly accurate results. Moreover, some predictors are broadly biased to classify disordered residues as phase-separating, which results in low predictive performance in this scenario. Finally, we recommend PSPHunter as the most accurate tool for identifying phase-separating IDRs in both scenarios.
Free, publicly-accessible full text available August 1, 2026
Comprehensive assessment of AlphaFold's predictions of secondary structure and solvent accessibility at the amino acid-level in eukaryotic, bacterial and archaeal proteins

Yu, Jing; Zhao, Bi; Kurgan, Lukasz (May 2025, Computational and Structural Biotechnology Journal)

Numerous sequence-based predictors of the amino acid (AA)-level solvent accessibility (SA) and secondary structure (SS) of proteins have been developed. We empirically investigated whether these two key characteristics of AA-level structure can be accurately predicted from putative structures generated by the popular AlphaFold2. We compared AlphaFold2's results against several representative SS and SA predictors on a large test dataset that covers five distinct taxonomic groups (animals, plants, fungi, bacteria, and archaea). We used a broad collection of metrics that evaluate predictions of the numeric and binary (buried vs. solvent exposed) SA and the 3-state SS at both AA- and SS-region levels. We found that AlphaFold2 generated very accurate results, with high average Q3 accuracy of 0.928 for the SS prediction and high Pearson Correlation Coefficient (PCC) of 0.815 between its putative and native SA values. AlphaFold2 significantly and consistently outperformed the considered predictors of SA and SS across the five taxonomic groups and both AA- and region-level evaluations. Moreover, we demonstrated that AlphaFold2 nearly perfectly reconstructs distributions of the sizes and numbers of the SS regions. We also showed that AlphaFold2 substantially improves over the SS and SA predictors when tested on a low sequence similarity test dataset, although its results and those of two other predictors suffer a modest drop in the quality of predicting SS regions. Altogether, our results suggest that AlphaFold2 makes very accurate predictions of SS and SA, which can be easily extracted from the 200+ million pre-computed AlphaFold2 structure predictions in AlphaFoldDB.
Free, publicly-accessible full text available May 29, 2026
Comparative assessment of binding residue predictions in intrinsically disordered regions

https://doi.org/10.1002/pro.70298

Basu, Sushmita; Kurgan, Lukasz (October 2025, Protein Science)

Dozens of impactful methods that predict intrinsically disordered regions (IDRs) in protein sequences that interact with proteins and/or nucleic acids have been developed. Their training and assessment rely on the IDR‐level binding annotations, while the equivalent structure‐trained methods predict more granular annotations of binding amino acids (AA). We compiled a new benchmark dataset that annotates binding AA in IDRs and applied it to complete a first‐of‐its‐kind assessment of predictions of the disordered binding residues. We evaluated a representative collection of 14 methods, used several hundred low‐similarity test proteins, and focused on the challenging task of differentiating these binding residues from other disordered AA and considering ligand type‐specific predictions (protein–protein vs. protein–nucleic acid interactions). We found that current methods struggle to accurately predict binding IDRs among disordered residues; however, better‐than‐random tools predict disordered binding residues significantly better than binding IDRs. We identified at least one relatively accurate tool for predicting disordered protein‐binding and disordered nucleic acid‐binding AA. Analysis of cross‐predictions between interactions with protein and nucleic acids revealed that most methods are ligand‐type‐agnostic. Only two predictors of the nucleic acid‐binding IDRs and two predictors of the protein‐binding IDRs can be considered as ligand‐type‐specific. We also discussed several potential future directions that would move this field forward by producing more accurate methods that target the prediction of binding residues, reduce cross‐predictions, and cover a broader range of ligand types.
Free, publicly-accessible full text available October 1, 2026
Two decades of advances in sequence-based prediction of MoRFs, disorder-to-order transitioning binding regions

https://doi.org/10.1080/14789450.2025.2451715

Song, Jiangning; Kurgan, Lukasz (January 2025, Expert Review of Proteomics)

Full Text Available
Evaluation of predictions of disordered binding regions in the CAID2 experiment

https://doi.org/10.1016/j.csbj.2024.12.009

Zhang, Fuhao; Kurgan, Lukasz (January 2025, Computational and Structural Biotechnology Journal)

Full Text Available
Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses

https://doi.org/10.1016/j.csbj.2024.04.059

Basu, Sushmita; Kurgan, Lukasz (December 2024, Computational and Structural Biotechnology Journal)

Full Text Available
Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences

https://doi.org/10.1093/bib/bbaf016

Basu, Sushmita; Yu, Jing; Kihara, Daisuke; Kurgan, Lukasz (January 2025, Briefings in Bioinformatics)

Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area.
MERIT: Accurate Prediction of Multi Ligand-binding Residues with Hybrid Deep Transformer Network, Evolutionary Couplings and Transfer Learning

https://doi.org/10.1016/j.jmb.2024.168872

Zhang, Jian; Basu, Sushmita; Zhang, Fuhao; Kurgan, Lukasz (November 2024, Journal of Molecular Biology)

Full Text Available
flDPnn2: Accurate and Fast Predictor of Intrinsic Disorder in Proteins

https://doi.org/10.1016/j.jmb.2024.168605

Wang, Kui; Hu, Gang; Basu, Sushmita; Kurgan, Lukasz (May 2024, Journal of Molecular Biology)

Full Text Available
